Text Extraction from PDF Image Using Enhanced Connected Component Labeling
Author
Abstract
This paper presents a new technique that greatly increases the speed of the connected component labeling algorithm, and proposes a system that uses it to extract text from PDF images. The system design is based on a modified connected component labeling method, which enhances the traditional algorithm with a component neighbor-scan labeling approach derived from Akmal et al. [9]. The method performs well in terms of both accuracy and speed, and its performance is demonstrated.
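For orientation, the traditional connected component labeling that the abstract takes as its starting point can be sketched as follows. This is a minimal illustration of the classical flood-fill variant with 8-connectivity, not the modified neighbor-scan method of the paper, whose details the abstract does not give. The function names `label_components` and `bounding_boxes` are hypothetical, and the input is assumed to be an already binarized image stored as a list of rows of 0/1 values.

```python
from collections import deque


def label_components(binary):
    """Label 8-connected foreground regions of a binary image.

    `binary` is assumed to be a list of equal-length rows holding 0
    (background) or 1 (foreground). Returns (labels, count), where
    `labels` has the same shape and holds 0 for background pixels or a
    component id in 1..count.
    """
    rows = len(binary)
    cols = len(binary[0]) if rows else 0
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            # Skip background pixels and pixels that already have a label.
            if not binary[r][c] or labels[r][c]:
                continue
            count += 1
            labels[r][c] = count
            queue = deque([(r, c)])
            # Breadth-first flood fill over the 8 neighbours of each pixel.
            while queue:
                y, x = queue.popleft()
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and binary[ny][nx] and not labels[ny][nx]):
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count


def bounding_boxes(labels):
    """Return {component id: (top, left, bottom, right)} for a label image."""
    boxes = {}
    for r, row in enumerate(labels):
        for c, lab in enumerate(row):
            if not lab:
                continue
            if lab not in boxes:
                boxes[lab] = (r, c, r, c)
            else:
                top, left, bottom, right = boxes[lab]
                boxes[lab] = (min(top, r), min(left, c),
                              max(bottom, r), max(right, c))
    return boxes
```

In a text extraction pipeline, the bounding boxes of the labeled components would typically be filtered by size and aspect ratio to keep character-like regions; that filtering step is an assumption here, not something the abstract describes.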
Similar resources
Extraction and Recognition of Text From Digital English Comic Image Using Median Filter
Text extraction from images is one of the complicated areas in digital image processing. Text characters embedded in an image represent a rich source of information for text retrieval applications. Detecting and recognizing text in comic images is a complex process because of their varying sizes, gray-scale values, complex backgrounds and different font styles. Text extraction process from co...
Survey on Optimized solution for efficient detection of Text from images
Text detection and recognition is a hot topic for researchers in the field of image processing. Text detection and extraction is performed in a four-step approach that consists of pre-processing, which includes binarization and noise removal of an image, image segmentation using connected component analysis, feature extraction using variance generation and finally classification by choosing a...
Arbitrarily Oriented Scene Text Detection using SMSER and Connected component analysis
In this work, a rotation-invariant approach has been explored and an effective rotation-invariant text detection system has been proposed. The discrete wavelet transform has been used for multi-level feature extraction of the text region, as the vertical, horizontal and diagonal coefficients capture variation in the edge pixels of the text scene image. Further this, detailed and approximation c...
Feature Extraction Using a Chaincoded Contour Representation of Fingerprint Images
A feature extraction method using the chaincode representation of fingerprint ridge contours is presented for use by Automatic Fingerprint Identification Systems. The representation allows efficient image quality enhancement and detection of fine feature points called minutiae. Enhancement is accomplished by binarization and smoothing followed by estimation of the ridge contours field of flow. ...
OCR for Handwritten Kannada Language Script
Optical character recognition (OCR) is the process of converting a scanned image of text into a computer-editable format. The proposed OCR system is for complex handwritten Kannada characters. One of the major challenges faced by a Kannada OCR system is the recognition of handwritten text from an image. The input text image is subjected to preprocessing and then converted into a binary image. Segment...